ABSTRACT

In the present invention a predetermined number of bits are added to each entry in the process table. These bits are used to indicate the warmth of the cache with respect to the particular schedulable unit, such as a process or thread of a process. The scheduler will then review not only the priority of the schedulable unit but also the warmth of the cache in order to determine the schedulable unit to be scheduled next with respect to a particular processor. For example, these cache warmth bits may be used to identify the processor the schedulable unit previously executed on, such that the scheduler will only schedule the schedulable unit with the processor it previously executed on, in order to take advantage of the schedulable unit data located in the cache associated with the processor. The system may be extended to provide more sophisticated models for determining cache warmth and the scheduling of processes and process threads.

21 Claims, 7 Drawing Sheets 
AT TIME t0:

                    PROCESSOR 1    PROCESSOR 2

CACHE CONTENTS:     [ ]            [ ]

PROCESS CONTEXT:    [ ]            [ ]

IN PROCESS QUEUE:   A,B,C,D,E

Figure 1a

AT TIME t1:

                    PROCESSOR 1    PROCESSOR 2

CACHE CONTENTS:     A

PROCESS CONTEXT:    A

IN PROCESS QUEUE:   B,C,D,E

Figure 1b

AT TIME t2:

                    PROCESSOR 1    PROCESSOR 2

CACHE CONTENTS:     A              B

PROCESS CONTEXT:    A              B

IN PROCESS QUEUE:   C,D,E

Figure 1c

AT TIME t3:

                    PROCESSOR 1    PROCESSOR 2

CACHE CONTENTS:     A/C            B

PROCESS CONTEXT:    C              B

IN PROCESS QUEUE:   D,E,A

Figure 1d

AT TIME t4:

                    PROCESSOR 1    PROCESSOR 2

CACHE CONTENTS:     A/C            B/D

PROCESS CONTEXT:    C              D

Figure 1e

(Prior Art)
Figure 3
Figure 4b
/* Scheduler: Makes use of cache warmth to schedule efficiently.

   This pseudo code runs on a given processor and decides which process to run next.
*/

static warmth_buff = 0, last_warmth_buff = 0;
static sticky_mode;

schedule(proc_list_ptr)
struct *proc_list_ptr[];
{
    int id;
    int proc, next_proc;

    if (sticky_mode == 1)
    {
        sticky_mode = 0;
        return;    /* to keep active process running */
    }

    park(active_process);
    get_warmth(&warmth_buff);    /* get warmth value of last running process */
    get_processor_id(&id);       /* get processor the scheduler is running on */

    if (check_thrash(warmth_buff, last_warmth_buff))
        sticky_mode = 1;         /* reset scheduling interval * 2 */

    active_process.warmth += warmth_buff;
    for (proc in *proc_list_ptr)
    {
        if (proc_list_ptr[proc].last_id == id)    /* ran on same processor last time */
            if (max < (proc_list_ptr[proc].warmth -= warmth_buff))
            {
                next_proc = proc;
                max = proc_list_ptr[proc].warmth;
            }
    }

    add_parked_process(proc_list_ptr);           /* adds the parked process to the process list */
    unpark_process(proc_list_ptr, next_proc);    /* gets new process to run and runs it */
    last_warmth_buff = warmth_buff;              /* update the last_warmth value */

    return;
}

check_thrash(new, last)
int new, last;
{
    if ((new > threshold)
        &&
        (last > threshold))
        return 1;
    else
        return 0;
}

Figure 6
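The pseudo code of Figure 6 leaves several things undefined (the process list type, the variable max, the threshold, and the park/unpark helpers). The following is a minimal runnable sketch of the same logic in C, not the patent's implementation. The struct layouts, the THRASH_THRESHOLD value, the explicit `active` and `misses` parameters, the fallback of keeping the active process when no candidate exists, and the update of last_id are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical process-table entry; field names follow Figure 6. */
struct proc_entry {
    int last_id;  /* processor the process last ran on */
    int warmth;   /* cache warmth value */
};

/* Scheduler state for one processor (Figure 6 keeps these as statics). */
struct sched_state {
    int last_warmth;  /* miss count reported at the previous rescheduling */
    int sticky_mode;  /* 1 = skip the next rescheduling */
};

/* Assumed miss count per time slice that indicates thrashing. */
enum { THRASH_THRESHOLD = 100 };

/* Thrashing: two successive processes both exceeded the threshold. */
static int check_thrash(int new_warmth, int last_warmth)
{
    return new_warmth > THRASH_THRESHOLD && last_warmth > THRASH_THRESHOLD;
}

/*
 * Called on processor `id` when a time slice ends. `active` is the index of
 * the process that just ran and `misses` is its miss count for the slice.
 * Returns the index of the process to run next.
 */
static size_t schedule(struct sched_state *st, struct proc_entry *procs,
                       size_t nprocs, size_t active, int id, int misses)
{
    size_t next = active;  /* assumption: keep running if no candidate */
    int best = INT_MIN;

    assert(nprocs > 0 && active < nprocs);
    if (st->sticky_mode) {           /* thrashing was detected last time */
        st->sticky_mode = 0;
        return active;               /* active process gets a second slice */
    }
    if (check_thrash(misses, st->last_warmth))
        st->sticky_mode = 1;         /* next process will run two slices */

    procs[active].warmth += misses;  /* its data displaced other data */
    procs[active].last_id = id;      /* assumption: record where it ran */
    for (size_t i = 0; i < nprocs; i++) {
        if (i == active || procs[i].last_id != id)
            continue;                /* consider only processes warm here */
        procs[i].warmth -= misses;
        if (procs[i].warmth > best) {
            best = procs[i].warmth;
            next = i;
        }
    }
    st->last_warmth = misses;
    return next;
}
```

Passing the state in a struct rather than using file-level statics lets one copy of the function serve every processor, which is one possible design choice among several.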
METHOD AND APPARATUS FOR EFFICIENT
SCHEDULING IN A MULTIPROCESSOR SYSTEM

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and apparatus for improving the efficiency of process scheduling in a multiprocess system.

2. Art Background

In a time sharing computer system the CPU is allocated to a process for a predetermined period of time called a time-slice or time quantum, at the end of which the process is pre-empted and a second process is scheduled to begin at the start of the next time-slice. The preempted process is then rescheduled to continue execution at a later time-slice. Process scheduling techniques are employed to determine the order in which processes have access to the CPU.

Process scheduling techniques have been extended to multiple-CPU computer systems. Processes are allocated a time-slice according to the CPU available. A process table is maintained which identifies each process to be executed. Each process table entry identifying a process contains a priority field for process scheduling. For example, the priority of a process may be a function of the amount of its CPU usage, with processes getting a lower priority if they have recently used the CPU. A process scheduler accesses the process table information and controls which processes are allocated the usage of the CPU. For information on process scheduling see The Design of the UNIX® Operating System, Maurice J. Bach, pages 247-258 (Prentice-Hall, Inc., 1986) and Operating System Concepts, 3rd Ed., Silberschatz, Peterson and Galvin, pages 97-125 (Addison-Wesley, 1991).
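The idea of a priority that drops with recent CPU usage can be sketched as follows. This is only an illustration in the style of the decay-based formulas commonly described for classic UNIX schedulers; the struct, the function names, the halving decay and the base value used in the usage note are assumptions, not details taken from this document.

```c
#include <assert.h>

/* Hypothetical process-table entry holding only the scheduling fields. */
struct proc_sched {
    int base;      /* base user priority (assumed value) */
    int cpu;       /* recent CPU usage, in clock ticks */
    int priority;  /* computed priority; a larger number means LOWER priority */
};

/* Charge one clock tick to the running process. */
static void charge_tick(struct proc_sched *p)
{
    p->cpu++;
}

/* Once per decay period: halve recent usage and recompute the priority. */
static void recompute_priority(struct proc_sched *p)
{
    assert(p->cpu >= 0);
    p->cpu /= 2;
    p->priority = p->base + p->cpu / 2;
}
```

With a base of 60, a process charged 60 ticks ends the period at priority 75, while an idle process stays at 60 and is therefore preferred at the next scheduling decision.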

Typically in a multiple CPU system, the scheduler will allocate the next available CPU to the process having the highest priority for scheduling. However, as multiple process systems become more sophisticated, other factors must be considered in scheduling processes to achieve the best results. In particular, in a multiple CPU system, cache memories are now allocated to each CPU. Applying currently known scheduling techniques results in poor usage and efficiency of the cache memories. This is illustrated with respect to FIG. 1. FIG. 1a shows that at time T0 there are five processes in the process queue, indicating those processes are ready to be executed: A, B, C, D, and E. Since no processes have been executed at time T0, the cache contents and the process context for each processor are empty.

Referring to FIG. 1b, at time T1 the first process is allocated to the first processor, Processor 1. Thus, the process context currently executing on Processor 1 is Process A and the contents of the cache contain data related to Process A. At time T2, referring to FIG. 1c, the next process of highest priority is allocated to execute on Processor 2. Therefore, the process context of Processor 2 is Process B and the contents of the cache contain data related to Process B. At time T3, as shown in FIG. 1d, a context switch is performed wherein Process A is swapped out from the CPU and the process of highest priority, Process C, is swapped in to be executed by Processor 1. Thus, the context of Processor 1 is Process C. After some execution of Process C, the cache contents will contain data related to Process C as well as pre-existing data located in the cache related to Process A. At time T4, referring to FIG. 1e, a context switch is performed on Processor 2, wherein Process B is swapped out and Process D, the next process to be executed, is swapped in. Thus, Processor 2 is executing Process D and the cache memory associated with Processor 2 contains a mixture of data related to Process B and Process D. Continuing the pattern, it is evident that Process E will be scheduled on Processor 1 and Processor 2 will pick Process A from the run queue to execute next. This exposes a critical flaw in extending current scheduling algorithms to multiple CPU systems. In particular, current scheduling algorithms do not account for the performance penalty of process shuffling among multiple processors. This penalty results from a "cold start" of the cache on the new processor, which could be avoided by scheduling the process on a CPU whose cache already contains data associated with the process. A method of scheduling which weighs this penalty in the scheduling algorithm would greatly improve performance. For purposes of the following discussion, a cache is said to be cold relative to a particular process when it contains little or no data required for the execution of that process and accesses to the cache will miss. A cache is said to be warm with respect to a particular process when it contains data required for the execution of the process and accesses to the cache will hit.

Referring to FIG. 1f, the pattern of processor allocation can be extended to the processors so that over time a history of the processors each process executes on can be determined. Note that Process A, which previously ran on Processor 0, will be restarted on Processor 1 and will not execute on Processor 0 until two processes (C and E) have previously executed on Processor 0. This ensures that much of the data from Process A in Processor 0's cache will have been replaced with data from Processes C and E. If the scheduling interval is approximately equal to the time it takes for the executing process to fill half of the cache, it can be seen that each process executes at best from a half-full cache when one or more intervening processes have run on the CPU. Rescheduling takes place at every other time interval, such that the rescheduling executions are out of phase with one another.
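The shuffling pattern of FIG. 1 can be reproduced with a small simulation. The sketch below assumes a plain first-in, first-out run queue, five equal-priority processes (index 0 is Process A) and two processors that reschedule alternately; processor index 0 stands for the first processor of the figure. The names simulate_fifo and struct trace are hypothetical.

```c
#include <assert.h>

enum { NPROC = 5, NCPU = 2, MAXRUNS = 8 };

/* cpu[p][k] is the processor used by process p on its k-th run. */
struct trace {
    int cpu[NPROC][MAXRUNS];
    int nruns[NPROC];
};

/* Replay `steps` rescheduling events of the prior-art FIFO policy. */
static void simulate_fifo(int steps, struct trace *t)
{
    int queue[NPROC], head = 0, count = NPROC;
    int running[NCPU];

    assert(steps >= 0);
    for (int p = 0; p < NPROC; p++) {
        queue[p] = p;
        t->nruns[p] = 0;
    }
    for (int c = 0; c < NCPU; c++)
        running[c] = -1;                  /* processor is idle */

    for (int step = 0; step < steps; step++) {
        int c = step % NCPU;              /* processors reschedule alternately */
        int next = queue[head];           /* take the head of the run queue */

        head = (head + 1) % NPROC;
        count--;
        if (running[c] >= 0) {            /* park preempted process at the tail */
            queue[(head + count) % NPROC] = running[c];
            count++;
        }
        running[c] = next;
        if (t->nruns[next] < MAXRUNS)
            t->cpu[next][t->nruns[next]++] = c;
    }
}
```

Running ten steps shows every process alternating between the two processors on successive runs, so under these assumptions each run starts on the cache that is cold for it.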

SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to 
provide a scheduling algorithm to account for the per- 
formance penalty of process shuffling among multiple 
processors and resources. 

It is an object of the present invention to provide a 
system which schedules processes according to the 
priority of the process as well as the cache warmth of 
the cache associated with a particular processor. 

In the present invention a predetermined number of bits are added to each entry in the process table. These bits are used to indicate the warmth of a processor's resource, such as a cache, with respect to a schedulable unit, such as a particular process or thread of a process. The scheduler will then review not only the priority of the schedulable unit but also the warmth of the cache in order to determine the schedulable unit to be scheduled next with respect to a particular processor. For example, these cache warmth bits may be used to identify the processor the schedulable unit previously executed on, such that the scheduler will only schedule the schedulable unit to run on the processor it previously executed on, in order to take advantage of the data (e.g., process instructions and process data) located in the cache associated with the specific processor. The system may be extended to provide more sophisticated models for determining cache warmth and the scheduling of schedulable units, such as processes and threads of processes.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects, features and advantages of the present invention will be apparent from the following detailed description in which:

FIGS. 1a-1f illustrate the prior art process of scheduling.

FIG. 2 illustrates the preferred embodiment of the system of the present invention.

FIG. 3 illustrates a process table containing the warmth bits used by the scheduler to schedule processes in a multiple process, multiple cache system.

FIGS. 4a and 4b illustrate embodiments of a process table containing cache warmth bits used by the scheduler to schedule processes in a multiple process, multiple cache system.

FIG. 5 illustrates exemplary logic implementing a preferred embodiment of the present invention.

FIG. 6 is exemplary scheduling code which utilizes cache warmth to schedule efficiently.



DETAILED DESCRIPTION OF THE
INVENTION

FIG. 2 illustrates a system for implementing the process of the present invention. A plurality of processes are executed by a plurality of processors 10, 20. Each processor 10, 20 has a cache 30, 40 associated with it. The scheduler 50 determines which process to assign to an available processor. The process table 60, FIG. 3, contains data regarding each active process including the system level context of the process, the virtual address memory management information and, in the present invention, status bits indicative of the warmth of the cache associated with the processor upon which the process was executed. By using priority information as well as cache warmth, scheduling queues 70 may be utilized to identify the order of processes to be executed. The number of bits required to indicate cache warmth is dependent upon the extent of information desirable. For example, in one embodiment, a number of bits is used to identify the processor number of the processor upon which the process was last executed. This similarly will indicate the cache which contains data related to the process or the cache which most likely will contain data related to the process. Further information such as a count of cache misses or a warmth count may also be represented to allow for more efficient scheduling algorithms.

FIG. 4a illustrates one embodiment of the present invention. Referring to FIG. 4a, if Process A was previously executed on Processor 1, the cache warmth bits may identify a value of 1 indicative of Processor 1. Similarly, if Process B executed previously on Processor 2, the binary value 10 identifies that Process B was executed by Processor 2. It follows that if Process C was executed by Processor 3 a value of 11 would be indicative of the cache warmth of the cache associated with Processor 3. Thus, the scheduler will schedule Process A with Processor 1 in order to minimize the number of cache misses and maximize usage of the data associated with Process A already contained in Processor 1's cache, which was previously stored in the cache during the earlier time-slice Process A was executed by Processor 1.
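A scheduler decision based on the FIG. 4a warmth bits could look like the sketch below. It is a simplified illustration, assuming that the value 0 marks a process that has not run yet (and may therefore go to any processor) and that a larger priority number is more urgent; pick_for_cpu and the struct layout are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process-table entry with FIG. 4a style warmth bits. */
struct proc_entry {
    int priority;       /* assumption: larger value = more urgent */
    unsigned last_cpu;  /* 0 = has not run yet, else processor number */
};

/*
 * Pick the most urgent process that is either warm on `cpu` or has never
 * run. Returns its table index, or -1 if every process is warm elsewhere.
 */
static int pick_for_cpu(const struct proc_entry *tbl, size_t n, unsigned cpu)
{
    int best = -1;

    assert(cpu != 0);  /* 0 is reserved for "has not run yet" */
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].last_cpu != 0 && tbl[i].last_cpu != cpu)
            continue;  /* warm on another processor: leave it waiting */
        if (best < 0 || tbl[i].priority > tbl[best].priority)
            best = (int)i;
    }
    return best;
}
```

A return value of -1 leaves the processor idle, which is the cost of strict affinity; a practical scheduler would probably relax the rule after some waiting time.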

FIG. 4b illustrates an alternate embodiment of the present invention wherein the cache warmth bits identify not only the processor which executed the process, but also the number of time-slices that have passed since the process was executed by that processor. Thus, for example, if Process A was executed by Processor 1 at time-slice 1, Process B was executed by Processor 2 at time-slice 2, Process C was executed by Processor 3 at time-slice 3 and Process D was executed by Processor 1 at time-slice 4, the status bits at the end of time-slice 4 may look like that shown in FIG. 4b. In particular, Process D was executed by Processor 1, indicated by the left-most binary value 01, at the current time-slice, indicated by the second binary value 00. Similarly, the status bits associated with Process A indicate that Process A was previously executed by Processor 1 (as indicated by the binary value 01) and was executed 3 time-slice periods previously (as indicated by the binary value 11). The scheduler may utilize this information to schedule Process A at the next time-slice to Processor 1 because Process A, as indicated by the cache warmth status bits for Process A located in the process table, was previously executed on Processor 1 and has been waiting the longest period of time (as indicated by the second binary value) for the availability of Processor 1. Similarly, the scheduler may utilize this information to maximize CPU utilization and cache warmth by scheduling the processes most recently executed by the processors more frequently and scheduling the least used processes only when a predetermined number of time-slices have passed. Such a scheduling scheme would have an effect similar to increasing the time-slice duration, since the maximum latency between execution intervals increases.
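The two-field encoding described for FIG. 4b can be sketched as a packed value: two bits for the processor number followed by two bits for the age in time-slices. The field widths match the binary values quoted in the text (01 00 for Process D, 01 11 for Process A); the saturation of the age at 3, the use of 0 for "has not run yet" and all function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { AGE_BITS = 2, AGE_MAX = (1 << AGE_BITS) - 1, CPU_MAX = 3 };

/* Pack processor number and age; the age saturates at AGE_MAX. */
static uint8_t warmth_pack(unsigned cpu, unsigned age)
{
    assert(cpu <= CPU_MAX);
    if (age > (unsigned)AGE_MAX)
        age = AGE_MAX;
    return (uint8_t)((cpu << AGE_BITS) | age);
}

static unsigned warmth_cpu(uint8_t w) { return (unsigned)w >> AGE_BITS; }
static unsigned warmth_age(uint8_t w) { return (unsigned)w & AGE_MAX; }

/* End of a time-slice: process `ran` gets age 0, every other process ages. */
static void end_of_slice(uint8_t *tbl, int n, int ran, unsigned cpu)
{
    for (int i = 0; i < n; i++) {
        if (i == ran)
            tbl[i] = warmth_pack(cpu, 0);
        else if (warmth_cpu(tbl[i]) != 0)   /* skip processes that never ran */
            tbl[i] = warmth_pack(warmth_cpu(tbl[i]), warmth_age(tbl[i]) + 1);
    }
}

/* Index of the longest-waiting process that last ran on `cpu`, or -1. */
static int longest_waiting(const uint8_t *tbl, int n, unsigned cpu)
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        if (warmth_cpu(tbl[i]) != cpu)
            continue;
        if (best < 0 || warmth_age(tbl[i]) > warmth_age(tbl[best]))
            best = i;
    }
    return best;
}
```

Replaying the four time-slices of the example leaves Process A at binary 0111 and Process D at binary 0100, and longest_waiting selects Process A for Processor 1.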

The cache warmth bits can also indicate a value to be used by the scheduler which indicates a number of cache misses that have occurred for any given process. This may then be used to perform scheduling operations. For example, the lower the number of cache misses for a given process, the more likely the cache contains data related to that process. Thus, it may be more efficient to give a higher priority to that process while the cache contains process relevant data and lower priority to other processes which will incur cache misses when executed (because process related data is not currently located in the cache) and will therefore require time consuming memory operations in order to update the cache.
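One way to fold a miss count into the priority is to subtract a capped penalty that grows with the number of misses. The sketch below is an assumption-laden illustration: the constants MISS_SHIFT and MAX_PENALTY, the convention that a larger priority value is more urgent, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MISS_SHIFT = 8, MAX_PENALTY = 20 };  /* assumed tuning constants */

/* Hypothetical process-table entry with a miss count as warmth bits. */
struct proc_entry {
    int priority;     /* assumption: larger value = more urgent */
    unsigned misses;  /* cache misses recorded for the process */
};

/* Base priority minus a capped penalty of one step per 256 misses. */
static int effective_priority(const struct proc_entry *p)
{
    unsigned penalty = p->misses >> MISS_SHIFT;

    if (penalty > (unsigned)MAX_PENALTY)
        penalty = MAX_PENALTY;
    return p->priority - (int)penalty;
}

/* Index of the process with the best effective priority, or -1 if n == 0. */
static int pick_warmest(const struct proc_entry *tbl, size_t n)
{
    int best = -1;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (best < 0 ||
            effective_priority(&tbl[i]) > effective_priority(&tbl[best]))
            best = (int)i;
    return best;
}
```

Capping the penalty keeps a process with many misses from being starved indefinitely, which seems a reasonable safeguard although the text does not prescribe one.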

Furthermore, the scheduler may utilize the cache warmth information to perform load balancing, that is, the scheduler may schedule processes which require maximum CPU usage to a single processor while grouping those processes which are I/O intensive but not CPU intensive to other processors. This information can be derived, for example, from information in the process table about whether the process was pre-empted or blocked on I/O when rescheduling occurred, the number of time slices passed and the number of times the processor was allocated to a process.

For example, if two processes A and B are executing on Processor 1, and Process A is CPU intensive (70% CPU usage) and Process B is I/O intensive and not CPU intensive (e.g., a keyboard program, 30% CPU usage), Process A will be granted access to Processor 1 more frequently because Process B requires little CPU time. If Process C is allocated to Processor 2 and similarly is I/O intensive and not CPU intensive (e.g., 20% CPU usage), the scheduler will be able to determine from the cache warmth bits and CPU utilization information in the process table (e.g., the amount of time the process has been executing) that Processor 2 is underutilized and it would be more efficient to allocate Process B to Processor 2, whereby Process A can have sole access to Processor 1.

A problem which sometimes arises in heavily loaded multiprocessing environments is resource thrashing. Thrashing of the cache occurs when processes executing replace a substantial portion of the processor's cache before the next process runs on that processor. The next process scheduled on that processor will incur numerous cache misses and also replace a substantial portion of the processor's cache entries with its process specific data. When the first process is subsequently rescheduled to the same processor, and thus to the same cache, a number of cache misses are incurred and the cache entries are again replaced with data related to the first process. Because the time-slice interval is fixed and cache misses require the CPU to wait idly for data from memory, CPU utilization decreases dramatically when cache thrashing takes place. Thus, it is preferred that the scheduler take into account the problem of thrashing when scheduling processes. In particular, the cache warmth bits may be used to detect the presence of thrashing. For example, the cache warmth for a scheduled process is compared to a cache warmth threshold value indicative of thrashing. Thrashing is found to exist if two successive processes exceed the threshold value. Once thrashing is detected, the scheduling of processes can be modified to eliminate some of the deleterious effects of cache thrashing. For example, if thrashing is detected, the duration of time a process executes on a given processor can be increased by restricting rescheduling operations to occur at every other time interval. (See code in FIG. 6 for example.)

Although in the preferred embodiment the cache warmth bits are stored in the process table for easy access, the information regarding scheduling efficiency may incorporate other areas of the kernel and hardware such as the MMU or a separate or specified portion of kernel memory. For example, information regarding cache warmth for each process may be stored in the process description block or process table. Furthermore, the concept of cache warmth can be extended to instead track page faults which occur in a virtual memory system or other types of systems in which shared resources maintain process specific data.

The scheduler is modified slightly to include logic which utilizes the cache warmth bits to determine the priority of processes and the impact that scheduling a particular process would have in terms of CPU utilization. This logic may be implemented in hardware or software or a combination of both. For example, in hardware, a comparator is employed to compare the status bits identifying the processor ID upon which the process last executed to the processor ID of the next processor available to be scheduled. If the IDs do not match, the process will not be scheduled to that processor but will wait to be scheduled upon the processor for which the IDs do match. FIG. 6 is an example of scheduler code which uses cache warmth to schedule efficiently and decide which process to run next. This scheduler also determines if thrashing is occurring, and if so, schedules the same process twice in succession to increase CPU utilization which would otherwise suffer during thrashing.

A simple example of comparator logic is set forth in FIG. 5. The cache 500 comprises an address tag 510 and data 520, 525, 530, 535. The tag information generated from the CPU address is compared to the tag from the cache and, if they are equal, a cache hit is noted and the data from the cache is extracted through the multiplexors 570, 575. If the tags are not equal, a cache miss occurs and the counter 560 tracking the misses determinative of cache warmth is incremented and the cache is updated with data read from memory (not shown). This logic also provides a simple means to read the counter 560 by tracking it as an additional cache. A predetermined address is used to address the counter. A predetermined address is supplied to the cache and the cache supplies the value of the counter.

Preferably, the mechanism which tracks cache warmth, e.g., counter 560 (FIG. 5), does not record a miss, and therefore increase the cache warmth value, when the process is replacing its own data, as this replacement is not indicative of a change of state of cache warmth. Therefore, in an alternative embodiment, the process context of the line of the cache to be replaced is compared to the context of the currently executing process. If the contexts are associated with the same process (e.g., the contexts are equal), the counter is not incremented. If, however, they are different, then the line of memory being placed in the cache will alter the warmth of the cache. Once the counter associated with a segment has been determined, the method of comparing the scheduling context of the line being replaced with the currently executing scheduling context (e.g., the executing process) may be employed as outlined above.

The ability to model the degree to which the process executing on a particular CPU modifies the resources (cache or other) associated with that CPU is limited by the counter implementation which tracks the resource. Ideally, the counter would be able to differentiate the processes at the finest level of granularity, i.e., the smallest unit manipulated by the scheduling algorithm. For example, in a multiple processor system executing multi-threaded processes, the counter would use knowledge of the current thread of execution and information about the thread of execution of the line being replaced in the cache to determine whether or not to increment the counter. In this scheme the cache tags must contain added information: the unique ID that specifies the thread of execution or the group of threads to which the line belongs.

The cache and scheduling algorithm can be modified to utilize the technique of cache coloring. Coloring refers to segmenting the cache into regions and using a predetermined number of bits of the virtual address to generate a hash value which selects the cache region to which a process address maps. Coloring restricts different processes to different segments of the cache, and thus reduces aliasing of several addresses from different processes to the same line. In an embodiment which uses cache coloring, separate counters are maintained to track the cache warmth values in a cache segment. The bits used to generate a hash value also distinguish the counters between processes. Once the counter associated with a segment has been identified, the method of comparing contexts of the line being replaced and the process currently executing may be employed as previously discussed. Cache coloring will minimize the effect
of a single process destroying all data in the cache by restricting its addresses to map to only one segment of the cache. Thus, a single process will only affect the one region of the cache it maps to, rather than the entire cache.

In an alternative embodiment not only the warmth value for the current process is updated, but also the warmth values for all other processes that previously executed on the same processor are updated. This provides greater accuracy in the cache warmth measurement. The technique operates as follows: after the process has run, the current process's cache warmth is updated by an increment equal to the number of cache misses which occurred. The cache warmth values for all the other remaining processes which ran on the same processor are decremented by an amount proportional to their respective current cache warmth value. This is illustrated by the example below:

At time T1, Processes A, B, C, D respectively have cache warmth values of 50, 10, 20, 20 for a total cache warmth value of 100; Process E is currently executing. At time T2 Process E (a new process) has run, resulting in a cache warmth value of 70, and a new process is selected for execution. The cache warmth values are proportionally adjusted according to the following equation: new cache warmth = old cache warmth - (old cache warmth * current process cache warmth/total cache warmth). Using these new values for warmth, the next process to run is selected. Assuming all processes have equal priority, Process A is selected to run at time T2.

Similarly, at time T3, Process A finishes running with a cache warmth value of 90. The cache warmth values for Processes B, C, D, E are proportionally decremented by an amount such that the total number of misses distributed among the processes is equal to the increased warmth assigned to the process previously executed. It follows that Process E, which had the greatest proportion of the cache, would similarly lose the greatest amount. The computed cache warmth values accurately reflect this.

While the invention has been described in conjunction with the preferred embodiment, it is evident that numerous alternatives, modifications, variations and uses will be apparent to those skilled in the art in light of the foregoing description. In particular, the invention described herein can be utilized with a variety of types of resources in order to maximize efficient usage of the resource. Further, a variety of types of processors may be employed. In addition, the present invention applies not only to processes but to the scheduling of any schedulable unit. An example of another schedulable unit is a thread of a process.

A traditional UNIX process contains a single thread of control. This thread consists of the sequence of executing instructions along with a minimal amount of state variables such as a program counter (PC) and stack frame (SF). A multi-processor system in which memory is shared by all processors can execute different processes (each with a single thread of control) concurrently. It is important to note that each process runs in its own memory space and contains a single thread of execution. Thus, concurrent process execution is the finest grain of parallelism achievable in a multi-processor environment with singly-threaded processes.

A multi-threaded UNIX process contains several threads of control. Each thread in a multi-threaded process consists of the sequence of instructions being executed by that particular thread and a collection of state variables that are unique to the thread. Thus, each thread contains its own PC and SF variables. Multiple threads allow for parallelism and concurrent execution within a process when more than one processor is available. A multi-processor system in which memory is shared by all processors can execute different threads (from one or multiple processes) concurrently. It is important to note that each process runs in its own memory space as before, but now multiple threads of execution share the same memory space, thus the need for unique state variables (PC and SF among others) for each thread. Therefore, concurrent thread execution is the finest grain of parallelism achievable in a multi-processor environment with multiply-threaded processes. For further information see Powell et al., "SunOS Multi-thread Architecture," USENIX, Winter, 1991.

From the above discussion, it should be clear that a 
multiple processor system running only one single- 
threaded process will take the same amount of execu- 
tion time as a single processor system because there is 
no means for parallel execution. However, multi- 
threaded processes will be able to exploit concurrent 
thread execution, and thus, make use of several proces- 
sors to speed execution of the process. 

The present invention is therefore applicable to any 
system with multiple threads of execution executing 
concurrently. These threads may consist of several 
singly-threaded processes or threads from one or more 
multi-threaded processes or a mixture of singly- 
threaded and multi-threaded processes. 
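The proportional warmth adjustment described earlier (the example with Processes A through E) can be sketched in a few lines. The sketch reads "current process cache warmth" as the increase in the warmth of the process that just ran and "total cache warmth" as the sum over the other processes; that reading is an assumption, chosen because it reproduces the T2 step (50, 10, 20, 20 becoming 15, 3, 6, 6) and keeps the total constant at the T3 step. The function name, the use of double and the clamping of the increase are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * warmth[]   : per-process warmth values for one processor's cache
 * current    : index of the process that just ran
 * new_warmth : its warmth after the run (old warmth plus misses)
 */
static void update_warmth(double *warmth, size_t n, size_t current,
                          double new_warmth)
{
    double delta, others = 0.0;

    assert(current < n && new_warmth >= warmth[current]);
    delta = new_warmth - warmth[current];
    for (size_t i = 0; i < n; i++)
        if (i != current)
            others += warmth[i];

    warmth[current] = new_warmth;
    if (others <= 0.0)
        return;                 /* nothing cached by other processes */
    if (delta > others)
        delta = others;         /* cannot displace more than is cached */

    /* Each other process loses a share proportional to its own warmth. */
    for (size_t i = 0; i < n; i++)
        if (i != current)
            warmth[i] -= warmth[i] * delta / others;
}
```

Applied to the example, the first call (Process E reaching 70) yields 15, 3, 6, 6 for A through D, and the second call (Process A reaching 90) takes the largest share from Process E while the sum of all values stays at 100.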

What is claimed is: 

1. In a computer system comprising multiple proces- 
sors and at least one resource associated with, and cou- 
pled to, each processor, each processor executing in the 
context of a schedulable unit for a time slice before 
switching to execute in the context of another schedula- 
ble unit, an apparatus for efficiently scheduling schedu- 
lable units to be executed on the processors, said appara- 
tus comprising: 
a process table comprising resource warmth informa- 
tion regarding each schedulable unit, each proces- 
sor being identified by a processor ID and, if said 
schedulable unit has previously executed in a pro- 
cessor, said resource warmth information compris- 
ing information indicating the processor ID of the 
processor on which the schedulable unit was previ- 
ously executed; 
a process scheduler, said process scheduler coupled to said process table and to each said processor, said process scheduler comprising a comparator to compare the processor ID of a schedulable unit to be scheduled with the corresponding processor ID stored in the process table, such that, if the schedulable unit has executed previously, the schedulable unit is scheduled with the corresponding processor the ID of which matches the processor ID stored in the process table.

2. The apparatus as set forth in claim 1, wherein said 
resource is a cache memory and said resource warmth 
information associated with a schedulable unit identifies 
the number of cache misses that have occurred on the 
cache associated with the identified processor, said 15 
process scheduler utilizing resource warmth informa- 
tion to determine the processor upon which the schedu- 
lable unit is to be scheduled by scheduling the schedula- 
ble unit to be scheduled with the processor if the num- 
ber of cache misses does not exceed a second predeter- 
mined number. 
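
As a rough illustration of the cache-miss test in claim 2, the following Python sketch treats a cache as warm for a unit only while the miss count stays at or below a threshold. The names `WarmthInfo`, `misses_since_run`, and `prefers_processor`, and the threshold value, are assumptions made for this example; the claim does not say how the "second predetermined number" is chosen.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for the "second predetermined number"; the value is illustrative.
MAX_CACHE_MISSES = 1000


@dataclass
class WarmthInfo:
    last_processor_id: Optional[int]  # None if the unit has never executed
    misses_since_run: int = 0         # misses counted on that processor's cache


def prefers_processor(info: WarmthInfo, processor_id: int,
                      max_misses: int = MAX_CACHE_MISSES) -> bool:
    """True if the cache of processor_id is still considered warm for the unit."""
    return (info.last_processor_id == processor_id
            and info.misses_since_run <= max_misses)


print(prefers_processor(WarmthInfo(1, 200), 1))   # True
print(prefers_processor(WarmthInfo(1, 5000), 1))  # False
```

What happens once the threshold is exceeded is left open here; one plausible choice is to treat the unit as having no processor preference.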

3. The apparatus as set forth in claim 2, wherein the 
number of cache misses which have occurred include 
cache misses due to data in the cache associated with
another schedulable unit which cause a line of data in 
the cache to be replaced with data from another schedu- 
lable unit. 

4. The apparatus as set forth in claim 2, wherein the 
cache is divided into a plurality of regions, each region
having separate cache miss information, each region 
utilized by a different group of schedulable units, 
whereby the effect of extensive use of the cache by a 
single schedulable unit is minimized. 

5. The apparatus as set forth in claim 2, wherein the
resource warmth information for a schedulable unit is 
used to calculate a resource warmth value for the 
schedulable unit, said apparatus further comprising a 
total resource warmth value corresponding to the sum 
of resource warmth values for the schedulable units 
accessing a resource, wherein when the resource 
warmth value for a schedulable unit increases, the re- 
source warmth values for other schedulable units ac- 
cessing the resource decrease in proportion to their 
respective resource warmth values such that the total
resource warmth value remains constant and the re- 
source warmth values more accurately reflect cache 
usage by the schedulable units. 
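
The constant-total bookkeeping described in claim 5 can be sketched in Python as follows. The function name `raise_warmth` and the use of a plain dictionary are assumptions made for this example; the claim does not prescribe a data layout or how the size of an increase is derived from cache-miss counts.

```python
from typing import Dict


def raise_warmth(warmth: Dict[str, float], unit: str, delta: float) -> Dict[str, float]:
    """Increase the warmth of `unit` by `delta` and shrink the other units'
    values in proportion to their current values, so the total is unchanged."""
    others_total = sum(v for k, v in warmth.items() if k != unit)
    if delta < 0 or delta > others_total:
        raise ValueError("delta must lie between 0 and the other units' total warmth")
    # Every other unit keeps the same fraction of the remaining warmth.
    scale = (others_total - delta) / others_total if others_total > 0 else 1.0
    return {k: (v + delta if k == unit else v * scale) for k, v in warmth.items()}


before = {"A": 2.0, "B": 6.0, "C": 2.0}
after = raise_warmth(before, "A", 4.0)
print(after)                # {'A': 6.0, 'B': 3.0, 'C': 1.0}
print(sum(after.values()))  # 10.0
```

In this example B and C each lose half of their warmth, so B (which held more of the cache) gives up more in absolute terms while the total stays at 10.0.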

6. The apparatus as set forth in claim 3, wherein the 
resource associated with a processor comprises multiple
levels of caches and the information indicative of cache 
warmth comprises information for each cache. 

7. The apparatus as set forth in claim 1, wherein said 
resource warmth information further comprises information
to identify the number of time slices since the
schedulable unit was last executed on the processor and 
said schedulable unit is scheduled with the processor if 
the number of time slices does not exceed a first
predetermined number.
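
One way the time-slice count of claim 7 might be maintained is sketched below in Python. The helper names `tick` and `still_warm`, the per-processor dictionary of counters, and the threshold value are all assumptions made for this example rather than details taken from the claims.

```python
from typing import Dict, Optional

# Stand-in for the "first predetermined number"; the value is illustrative.
MAX_IDLE_SLICES = 3


def tick(slices_since_run: Dict[str, int], ran: Optional[str]) -> None:
    """Advance one time slice on a processor: reset the counter of the unit
    that ran and age the counters of every other unit tracked for it."""
    for unit in slices_since_run:
        slices_since_run[unit] = 0 if unit == ran else slices_since_run[unit] + 1


def still_warm(slices_since_run: Dict[str, int], unit: str,
               max_idle: int = MAX_IDLE_SLICES) -> bool:
    """True while the unit has not been away for more than max_idle slices."""
    return slices_since_run[unit] <= max_idle


counters = {"A": 0, "C": 0}
for _ in range(4):          # unit C runs for four consecutive time slices
    tick(counters, "C")
print(counters)                   # {'A': 4, 'C': 0}
print(still_warm(counters, "A"))  # False
print(still_warm(counters, "C"))  # True
```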

8. The apparatus as set forth in claim 1, wherein said
resource comprises a virtual memory and said resource 
warmth information further comprises information to 
identify the number of page-faults that have occurred in 
the virtual memory associated with the identified processor,
said process scheduler utilizing that information
to determine the processor on which the schedulable 
unit is to be scheduled by scheduling the schedulable 
unit to be scheduled with the processor if the number of 
page-faults does not exceed a third predetermined number.

9. The apparatus as set forth in claim 1, wherein the 
schedulable unit is a process. 

10. The apparatus as set forth in claim 1, wherein the 
schedulable unit is a thread of a process. 

11. In a computer system comprising multiple proces- 
sors and at least one cache memory associated with, and 
coupled to, each processor, each processor executing in 
the context of a schedulable unit for a time slice before 
switching to execute in the context of another schedula- 
ble unit, an apparatus for efficiently scheduling schedu- 
lable units to be executed on the processors, said appara- 
tus comprising: 

a process table comprising cache warmth information
regarding each schedulable unit, each processor 
being identified by a processor ID and, if said 
schedulable unit has previously executed in a pro- 
cessor, said cache warmth information comprising 
information indicating the processor ID of the 
processor on which the schedulable unit was previ- 
ously executed; 
a process scheduler, said process scheduler coupled 
to said process table and to each said processor, 
said process scheduler comprising a comparator to 
compare the processor ID of a schedulable unit to 
be scheduled with the corresponding processor ID 
stored in the process table, such that, if the schedu- 
lable unit has executed previously, the schedulable 
unit is scheduled with the corresponding processor 
the ID of which matches the processor ID stored 
in the process table. 

12. The apparatus as set forth in claim 11, wherein the 
schedulable unit is a process. 

13. The apparatus as set forth in claim 11, wherein the 
schedulable unit is a thread of a process. 

14. The apparatus as set forth in claim 11, wherein 
said cache warmth information further comprises infor- 
mation to identify the number of time slices since the 
schedulable unit was last executed on the processor and 
said schedulable unit is scheduled with the processor if 
the number of time slices does not exceed a first
predetermined number.

15. The apparatus as set forth in claim 11, wherein 
said cache warmth information associated with a 
schedulable unit identifies the number of cache misses 
that have occurred on the cache associated with the 
identified processor, said process scheduler utilizing 
cache warmth information to determine the processor 
upon which the schedulable unit is to be scheduled by 
scheduling the schedulable unit to be scheduled with 
the processor if the number of cache misses does not 
exceed a second predetermined number. 

16. In a computer system comprising multiple proces- 
sors and at least one cache memory associated with 
each processor, each processor executing in the context 
of a schedulable unit for a time slice before switching to 
execute in the context of another schedulable unit, a 
method for efficiently scheduling schedulable units to 
be executed on the processors, said method comprising 
the steps of: 

providing a process table comprising information 
regarding each schedulable unit, said process table 
comprising information indicative of cache 
warmth, said cache warmth information including 
information indicative of the processor upon which 
the schedulable unit was previously executed if the 
schedulable unit was previously executed; 
maintaining in the process table the cache warmth
information regarding each schedulable unit; 

scheduling a schedulable unit to a processor accord- 
ing to the cache warmth such that if a schedulable 
unit to be scheduled was previously executed, the
schedulable unit to be scheduled is scheduled on 
the processor upon which the schedulable unit to be
scheduled was previously executed. 

17. The method as set forth in claim 16, wherein said 
cache warmth information identifies the number of time
slices since the schedulable unit was last executed on the 
processor and said schedulable unit is scheduled with 
the processor if the number of time slices does not
exceed a first predetermined number.

18. The method as set forth in claim 16, wherein said
cache warmth information identifies the number of 
cache misses that have occurred on the cache associated 
with the identified processor, said method comprising 
steps to utilize that information to determine the processor
upon which the schedulable unit will subsequently
execute by scheduling the schedulable unit to be sched- 
uled with the processor if the number of cache misses 
does not exceed a second predetermined number. 

19. The method as set forth in claim 16, wherein each 
processor is identified by a processor ID and said pro- 
cess table comprises information indicating the proces- 
sor ID of the processor upon which the schedulable unit 
was previously executed, said step of scheduling com- 
prising the steps of comparing the processor ID with 
the processor ID stored in the process table, whereby 
the schedulable unit is scheduled with the processor 
having a processor ID which matches the processor ID 
stored in the process table. 

20. The method as set forth in claim 16, wherein the 
schedulable unit is a process. 

21. The method as set forth in claim 16, wherein the 
schedulable unit is a thread of a process.